Opening Files

Supported File Types

PyMuPDF can open more than just PDF files.

The following file types are supported:

Document formats: PDF, XPS, EPUB, MOBI, FB2, CBZ, SVG, TXT
Image formats (input): JPG/JPEG, PNG, BMP, GIF, TIFF, PNM, PGM, PBM, PPM, PAM, JXR, JPX/JP2, PSD
Image formats (output): JPG/JPEG, PNG, PNM, PGM, PBM, PPM, PAM, PSD, PS
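
As a rough illustration, the lists above can be turned into a small extension lookup that runs before any file is opened. This is only a sketch: the names DOCUMENT_FORMATS, IMAGE_INPUT_FORMATS and classify are hypothetical helpers, not part of PyMuPDF. It assumes the first list holds document formats and the second holds image formats accepted as input. It looks only at the file extension and never calls PyMuPDF.

```python
from pathlib import Path

# Extensions copied from the lists above (assumed: documents first, image input second).
DOCUMENT_FORMATS = {"pdf", "xps", "epub", "mobi", "fb2", "cbz", "svg", "txt"}
IMAGE_INPUT_FORMATS = {
    "jpg", "jpeg", "png", "bmp", "gif", "tiff", "pnm", "pgm",
    "pbm", "ppm", "pam", "jxr", "jpx", "jp2", "psd",
}


def classify(path):
    """Return 'document', 'image' or 'unknown' based only on the file extension."""
    ext = Path(path).suffix.lstrip(".").lower()
    if ext in DOCUMENT_FORMATS:
        return "document"
    if ext in IMAGE_INPUT_FORMATS:
        return "image"
    return "unknown"
```

A result of "unknown" only means the extension is not in the lists above; it says nothing about the actual file contents.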

How to Open a File

To open a file, do the following:

doc = pymupdf.open("a.pdf")

Note

The above creates a Document. The instruction doc = pymupdf.Document("a.pdf") does exactly the same. So open is just a convenient alias, and its full API is documented in the Document chapter.

Opening with a Wrong File Extension

If you have a document with a wrong file extension for its type, you can still correctly open it.

Assume that “some.file” is actually an XPS. Open it like so:

doc = pymupdf.open("some.file", filetype="xps")

Note

PyMuPDF itself does not try to determine the file type from the file contents. You are responsible for supplying the file type information in some way: either implicitly, via the file extension, or explicitly, as shown, with the filetype parameter. Pure Python packages such as filetype can help you do this. Also consult the Document chapter for a full description.

If PyMuPDF encounters a file with an unknown or missing extension, it will try to open it as a PDF, so these cases need no additional precautions. Similarly, for memory documents, you can just specify doc = pymupdf.open(stream=mem_area) to open the data as a PDF document.

If you attempt to open an unsupported file, PyMuPDF will raise a file data error (pymupdf.FileDataError).
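
Since the file type has to be supplied by the caller, one possible approach is to inspect the first bytes of the file and pass the result as filetype. The following is a minimal sketch under stated assumptions, not a complete detector. The helpers sniff_filetype and open_sniffed are hypothetical names. Only a few well-known signatures (PNG, JPEG, GIF, PDF) are checked. It also assumes that opening a file with unusable data raises pymupdf.FileDataError.

```python
from typing import Optional

# A few well-known leading byte signatures and the filetype value to pass on.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_filetype(head: bytes) -> Optional[str]:
    """Guess a filetype from the first bytes of a file, or return None."""
    for magic, name in _SIGNATURES:
        if head.startswith(magic):
            return name
    # The PDF header is normally at offset 0 but may be preceded by a few bytes.
    if b"%PDF-" in head[:1024]:
        return "pdf"
    return None


def open_sniffed(path):
    """Open path, passing an explicit filetype when the signature is recognised."""
    import pymupdf  # imported here so sniff_filetype works without PyMuPDF installed

    with open(path, "rb") as fh:
        head = fh.read(1024)
    filetype = sniff_filetype(head)
    try:
        if filetype is None:
            # Fall back to PyMuPDF's own extension-based handling.
            return pymupdf.open(path)
        return pymupdf.open(path, filetype=filetype)
    except pymupdf.FileDataError as exc:
        raise ValueError(f"cannot open {path!r} as a document") from exc
```

ZIP-based formats such as XPS, EPUB and CBZ share the same leading bytes, so this sketch deliberately leaves them to the extension or to an explicit filetype argument.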


Opening Remote Files

For remote files on a server (i.e. non-local files), you will need to stream the file data to PyMuPDF.

For example, use the requests library as follows:

import pymupdf
import requests

r = requests.get('https://mupdf.com/docs/mupdf_explored.pdf')
data = r.content
doc = pymupdf.Document(stream=data)
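
A slightly more defensive variant of the snippet above might check the HTTP status and set a timeout before handing the bytes to PyMuPDF. Because a stream carries no file name, the sketch also derives a filetype from the URL path and falls back to "pdf". The helpers filetype_from_url and open_remote are hypothetical names, and the 30-second timeout is an arbitrary assumption. A URL whose extension is not a supported type would still need an explicit filetype.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def filetype_from_url(url, default="pdf"):
    """Derive a filetype from the URL path extension, or return the default."""
    suffix = PurePosixPath(urlparse(url).path).suffix
    return suffix.lstrip(".").lower() or default


def open_remote(url, timeout=30.0):
    """Download url and open the bytes as an in-memory document."""
    import pymupdf  # third-party; imported lazily so the helper above stays importable
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error page
    return pymupdf.open(stream=response.content, filetype=filetype_from_url(url))
```

For example, open_remote('https://mupdf.com/docs/mupdf_explored.pdf') would open the same file as the snippet above, with the filetype resolved to "pdf".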

Opening Files from Cloud Services

For further examples that deal with files held on typical cloud services, see the Cloud Interactions code snippets.


Opening Files as Text

PyMuPDF can open any plain text file as a document. To do this, pass "txt" as the filetype parameter of the pymupdf.open function.

doc = pymupdf.open("my_program.py", filetype="txt")

This lets you open a variety of text-based file types and use the features that are not PDF-specific, such as text searching, text extraction and page rendering. Once your txt content has been rendered, you can also save it as a PDF or merge it with other PDF files.
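
The workflow described above (search the text, then save the result as PDF) could look roughly like the following sketch. The helpers pdf_name_for and text_file_to_pdf are hypothetical names, and writing the output next to the source file with ".pdf" appended is just an assumed naming choice. The sketch relies on Document.convert_to_pdf() returning the PDF as bytes.

```python
from pathlib import Path


def pdf_name_for(src):
    """Return the source path with '.pdf' appended to the file name."""
    src = Path(src)
    return src.with_name(src.name + ".pdf")


def text_file_to_pdf(src, needle=None):
    """Open a plain text file as a document, optionally search it, and save it as PDF."""
    import pymupdf  # imported lazily so pdf_name_for works without PyMuPDF installed

    with pymupdf.open(str(src), filetype="txt") as doc:
        if needle is not None:
            for page in doc:
                hits = page.search_for(needle)  # list of rectangles around each match
                if hits:
                    print(f"page {page.number}: {len(hits)} hit(s) for {needle!r}")
        pdf_bytes = doc.convert_to_pdf()  # the rendered text pages as PDF data

    out = pdf_name_for(src)
    out.write_bytes(pdf_bytes)
    return out
```

Calling text_file_to_pdf("my_program.py", needle="import") would report the pages containing "import" and write my_program.py.pdf alongside the source.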

Examples

Opening a C# file

doc = pymupdf.open("MyClass.cs", filetype="txt")

Opening an XML file

doc = pymupdf.open("my_data.xml", filetype="txt")

Opening a JSON file

doc = pymupdf.open("more_of_my_data.json", filetype="txt")

And so on!

Many text-based file formats can be opened and interpreted by PyMuPDF just as simply. This makes data analysis and extraction possible for a wide range of files that were previously out of reach.


This software is provided AS-IS with no warranty, either express or implied. This software is distributed under license and may not be copied, modified or distributed except as expressly authorized under the terms of that license. Refer to licensing information at artifex.com or contact Artifex Software Inc., 39 Mesa Street, Suite 108A, San Francisco CA 94129, United States for further information.